Query Segmentation and Resource Disambiguation Leveraging Background Knowledge

Authors

  • Saeedeh Shekarpour
  • Axel-Cyrille Ngonga Ngomo
  • Sören Auer
Abstract

Accessing the wealth of structured data available on the Data Web is still a key challenge for lay users. Keyword search is the most convenient way for users to access information (e.g., from data repositories). In this paper we introduce a novel approach, based on a hidden Markov model, for determining the correct resources for user-supplied keyword queries. In our approach the user-supplied query is modeled as the observed data and the background knowledge is used for parameter estimation. Instead of learning the parameters from training data, we leverage the semantic relationships between data items to compute the parameter estimates. In order to maximize accuracy and usability, query segmentation and resource disambiguation are tightly interwoven. First, an initial set of potential segmentations is obtained by leveraging the underlying knowledge base; then the final set of segments is determined after the most likely resource mapping has been computed using a scoring function. While linguistic methods such as named entity recognition, multi-word unit recognition and POS tagging fail in the case of incomplete sentences (e.g., keyword-based queries), we show that our statistical approach is robust with regard to query expression variance. Our experiments with the hidden Markov model for resource identification in keyword queries yield very promising results.
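The modeling idea in the abstract (query segments as observations, candidate resources as hidden states) can be illustrated with a minimal sketch. It uses Viterbi decoding, a standard way to find the most likely state sequence in a hidden Markov model; the abstract does not name the decoding algorithm, so its use here is an assumption. The resource names (`ex:...`), the example query, and all probabilities are hypothetical toy values, not taken from the paper, where the parameters would instead be derived from the semantic relationships in the knowledge base.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    tiny = 1e-12  # floor to avoid log(0); an illustrative choice, not from the paper

    # best[t][s] = (log-probability of the best path ending in s at step t,
    #               predecessor state on that path)
    best = [{
        s: (math.log(start_p.get(s, tiny))
            + math.log(emit_p[s].get(observations[0], tiny)), None)
        for s in states
    }]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            score, arg = max(
                (prev[p][0] + math.log(trans_p[p].get(s, tiny)), p)
                for p in states
            )
            column[s] = (score + math.log(emit_p[s].get(obs, tiny)), arg)
        best.append(column)

    # Backtrack from the best final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Hypothetical toy setup: two query segments, three candidate resources.
query = ["jaguar", "speed"]
states = ["ex:Jaguar_Cars", "ex:Jaguar_animal", "ex:topSpeed"]
start_p = {"ex:Jaguar_Cars": 0.4, "ex:Jaguar_animal": 0.4, "ex:topSpeed": 0.2}
# Transition weights stand in for semantic relatedness between resources.
trans_p = {
    "ex:Jaguar_Cars": {"ex:Jaguar_Cars": 0.1, "ex:Jaguar_animal": 0.1, "ex:topSpeed": 0.8},
    "ex:Jaguar_animal": {"ex:Jaguar_Cars": 0.1, "ex:Jaguar_animal": 0.6, "ex:topSpeed": 0.3},
    "ex:topSpeed": {"ex:Jaguar_Cars": 0.4, "ex:Jaguar_animal": 0.3, "ex:topSpeed": 0.3},
}
# Emission weights stand in for how well a segment matches a resource label.
emit_p = {
    "ex:Jaguar_Cars": {"jaguar": 1.0},
    "ex:Jaguar_animal": {"jaguar": 1.0},
    "ex:topSpeed": {"speed": 1.0},
}

result = viterbi(query, states, start_p, trans_p, emit_p)
print(result)
```

In this toy setup both "jaguar" resources match the first segment equally well, so the choice between them is driven entirely by the transition weights into the resource that matches the second segment.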

Similar resources

Preliminary Lexical Framework For English-Arabic Semantic Resource Construction

This paper describes preliminary work concerning the creation of a Framework to aid in lexical semantic resource construction. The Framework consists of 9 stages during which various lexical resources are collected, studied, and combined into a single combinatory lexical resource. To evaluate the general Framework it was applied to a small set of English and Arabic resources, automatically comb...

Knowledge-based and vertical-driven information retrieval

The paper introduces the architecture and functionality of the knowledge-based information retrieval technology developed at Vertical Search Works. A large-scale language-independent ontology is used during indexing, query analysis, and document retrieval as part of a web-scale vertical search engine. Three specific areas are examined: the knowledge resource, its visualization and editing toolb...

SENSEABLE SEARCH: Selective Query Disambiguation

We present a method for detecting and resolving lexical ambiguity in information retrieval queries. Leveraging existing word sense disambiguation tools, we define a measure of query term ambiguity based on the distribution of word senses in the relevant document set. If a query term is ambiguous, we allow the user to select the correct sense of the query term, in the style of Google’s spelling ...

Entity Disambiguation with Linkless Knowledge Bases

Named Entity Disambiguation is the task of disambiguating named entity mentions in natural language text and linking them to their corresponding entries in a reference knowledge base (e.g. Wikipedia). Such disambiguation can help add semantics to plain text and distinguish homonymous entities. Previous research has tackled this problem by making use of two types of context-aware features derived f...

Entity Recognition and Linking in Chinese Search Queries

Aiming at the task of Entity Recognition and Linking in Chinese Search Queries in NLP&CC 2015, this paper proposes solutions for entity recognition, entity linking and entity disambiguation. A dictionary, an online knowledge base and SWJTU Chinese word segmentation are used in entity recognition. Synonyms thesaurus, redirect of Wikipedia and the combination of improved PED (Pinyin Edit Distance) ...

Publication date: 2012